Essentials for Web Scraping with R

Ayush Patel

Why are you here?

There is stuff you want from web pages.

You know R.

Here you will learn how to use R to get stuff from web pages.

Concepts you need to know beforehand

  • What are lists, vectors and data frames
  • How to subset these
  • How to use a function
  • How to use the pipe operator (%>% or |>)
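Both pipes appear in this material. As a quick refresher, here is a minimal check, assuming R 4.1 or later (where the native |> pipe was introduced) and the {magrittr} package for %>%, that the two piped forms compute the same thing as a nested call:

```r
library(magrittr)  # provides %>%; {dplyr} re-exports the same operator

x <- c(1, 4, 9)

with_nesting  <- sum(sqrt(x))            # read inside out
with_magrittr <- x %>% sqrt() %>% sum()  # read left to right
with_native   <- x |> sqrt() |> sum()    # base R pipe, needs R >= 4.1

c(with_nesting, with_magrittr, with_native)  # 6 6 6
```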

How we will go about learning this

  • Learn some absolute essentials of HTML¹
  • Do a quick, silly scraping exercise using {rvest} functions
  • Learn the essential functions of {rvest}
  • Take a detour into writing simple functions and iterating with {purrr} functions
  • Learn the foundations of {RSelenium} for browser automation

What is HTML?

HTML (HyperText Markup Language) is a markup language. What is a markup language?

It describes the structure of the webpage.

That structure is made of elements. What are elements?

Elements tell the browser how to display content.

This is what HTML looks like¹:

<html>
<head>
<title>Page Title</title>
</head>
<body>

<h1>My First Heading</h1>
<p>My first paragraph.</p>

</body>
</html>
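To see how R views this page, here is a small sketch (assuming {rvest} is installed) that parses the HTML above from a literal string and pulls out the title and the heading. read_html() accepts a string of literal HTML as well as a URL, which makes it easy to experiment offline.

```r
library(rvest)

# the example page above, passed to read_html() as a literal string
page <- read_html('
<html>
<head>
<title>Page Title</title>
</head>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body>
</html>')

# pick elements by tag name and return their text
page_title   <- page |> html_element("title") |> html_text()
page_heading <- page |> html_element("h1") |> html_text()

page_title    # "Page Title"
page_heading  # "My First Heading"
```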

What are Elements?

Below is an HTML Element.

<tag>stuff to be displayed on the webpage </tag>

Starts with <tag>

Ends with </tag>

Elements can be nested, meaning one element can sit inside another.

Examples include headings, paragraphs and lists.

There can be multiple elements of the same type on a webpage. How would you tell them apart? Each element can be pinpointed by a unique XPath, a CSS selector, or both.
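A minimal sketch on a hypothetical two-paragraph page (the ids first and second are made up for illustration): both elements are p, but the second one can be reached either by its position in an XPath or by its id attribute in a CSS selector.

```r
library(rvest)

# hypothetical page with two elements of the same type
page <- read_html('<p id="first">Hello</p><p id="second">Goodbye</p>')

# XPath: the second p element under its parent
by_xpath <- page |> html_element(xpath = "//p[2]") |> html_text()

# CSS selector: the element whose id is "second"
by_css <- page |> html_element("#second") |> html_text()

by_xpath  # "Goodbye"
by_css    # "Goodbye"
```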

What are Attributes?

Elements of the same type, say a heading, can have different attributes.

Consider this example:

<h3 style="color:blue;text-align:center">This is a header</h3>
<h3 style="color:red;text-align:right">This is a header</h3>

Rendered in a browser, both read "This is a header", but the first is blue and centered while the second is red and right-aligned.
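The style value is an attribute, and {rvest} can read attributes directly with html_attr(). A short sketch on the two headers above: the text of both elements is identical, and only the attribute tells them apart.

```r
library(rvest)

page <- read_html('
<h3 style="color:blue;text-align:center">This is a header</h3>
<h3 style="color:red;text-align:right">This is a header</h3>')

headers <- page |> html_elements("h3")

html_text(headers)           # same text for both elements
html_attr(headers, "style")  # but different attribute values
```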

The Family

HTML code

<div>
<p>This is the random stuff that comes out of my brain. It has no meaning. How many tomatoes are there in a ketchup bottle?</p>

  <ul>
    <li>potatoes, but potatoes have eyes
      <ul>
        <li>Stop this madness</li>
      </ul>
    </li>
    <li>tomatoes, do tomatoes have toes??</li>
  </ul>

</div>

HTML output
This is the random stuff that comes out of my brain. It has no meaning. How many tomatoes are there in a ketchup bottle?

  • potatoes, but potatoes have eyes
    • Stop this madness
  • tomatoes, do tomatoes have toes??

The Family

  • The Ancestors
  • The Descendants
  • The Parent & Child
  • The Siblings
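These family relations map directly onto XPath axes. A sketch, assuming {rvest} and a valid-HTML version of the potatoes-and-tomatoes example (a div holding the paragraph and the list), that walks from one element to its children, descendants, parent, ancestor and sibling:

```r
library(rvest)

# the family example, written as valid HTML
page <- read_html('
<div>
  <p>This is the random stuff that comes out of my brain.</p>
  <ul>
    <li>potatoes, but potatoes have eyes
      <ul>
        <li>Stop this madness</li>
      </ul>
    </li>
    <li>tomatoes, do tomatoes have toes??</li>
  </ul>
</div>')

# children: li elements directly inside the outer ul
children <- page |> html_elements(xpath = "//div/ul/li")

# descendants: every li anywhere under the div, at any depth
descendants <- page |> html_elements(xpath = "//div//li")

# parent: the element directly containing the innermost li
parent <- page |> html_element(xpath = "//li[. = 'Stop this madness']/parent::*")

# ancestor: the outer li that (indirectly) contains the innermost li
ancestor_li <- page |> html_element(xpath = "//li[. = 'Stop this madness']/ancestor::li")

# sibling: the li that follows the "potatoes" li at the same level
sibling <- page |>
  html_element(xpath = "//li[starts-with(., 'potatoes')]/following-sibling::li") |>
  html_text()

length(children)     # 2
length(descendants)  # 3
html_name(parent)    # "ul"
sibling              # "tomatoes, do tomatoes have toes??"
```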

That is all the HTML you need to get started.

Reckless coding detour

Here we will try to scrape a webpage without really knowing the details of the functions.

Take a good look at this website: https://www.giantbomb.com/looney-tunes/3025-714/characters/

Here is what we want:

  • Get names of all the characters on this page
  • Get the description of each of these characters as well
  • Create a data frame with columns name and description

Reckless coding continued

Load libraries

library(dplyr)
library(rvest)

Run this command. It returns the HTML code of the webpage:

read_html("put the url of the webpage here")

Use a CSS selector, or your browser's Inspect Element tool, to find the element that holds the character names:

read_html("put the url of the webpage here") %>% 
  html_elements("put desired element's name here")

Check the output. Not exactly what we want, is it?

Reckless coding continued

The output

{xml_nodeset (37)}
 [1] <h3 class="display-view">A brand of Warner Bros. Inc. It began as a cart ...
 [2] <h3>\n                  \n  \n          \n\n<dt id="js-field-label--fran ...
 [3] <h3 class="title">Barnyard Dawg</h3>
 [4] <h3 class="title">Big Chungus</h3>
 [5] <h3 class="title">Bugs Bunny</h3>
 [6] <h3 class="title">Daffy Duck</h3>
 [7] <h3 class="title">Dr. Moron</h3>
 [8] <h3 class="title">Egghead Jr.</h3>
 [9] <h3 class="title">Elmer Fudd</h3>
[10] <h3 class="title">Foghorn Leghorn</h3>
[11] <h3 class="title">Gossamer</h3>
[12] <h3 class="title">Hamton J. Pig</h3>
[13] <h3 class="title">Hector the Bulldog</h3>
[14] <h3 class="title">Henery Hawk</h3>
[15] <h3 class="title">Hippety Hopper</h3>
[16] <h3 class="title">Hugo the Abominable Snowman</h3>
[17] <h3 class="title">K-9</h3>
[18] <h3 class="title">Lola Bunny</h3>
[19] <h3 class="title">Marvin The Martian</h3>
[20] <h3 class="title">Melissa Duck</h3>
...

Add html_text() to the code:

read_html("put the url of the webpage here") %>% 
  html_elements("put desired element's name here") %>% 
  html_text()

Something is not quite right yet:

 [1] "A brand of Warner Bros. Inc. It began as a cartoon series which spawned a number of tie-ins including video games."
 [2] "\n                  \n  \n          \n\nSummaryShort summary describing this franchise.\n            "             
 [3] "Barnyard Dawg"                                                                                                     
 [4] "Big Chungus"                                                                                                       
 [5] "Bugs Bunny"                                                                                                        
 [6] "Daffy Duck"                                                                                                        
 [7] "Dr. Moron"                                                                                                         
 [8] "Egghead Jr."                                                                                                       
 [9] "Elmer Fudd"                                                                                                        
[10] "Foghorn Leghorn"                                                                                                   
[11] "Gossamer"                                                                                                          
[12] "Hamton J. Pig"                                                                                                     
[13] "Hector the Bulldog"                                                                                                
[14] "Henery Hawk"                                                                                                       
[15] "Hippety Hopper"                                                                                                    
[16] "Hugo the Abominable Snowman"                                                                                       
[17] "K-9"                                                                                                               
[18] "Lola Bunny"                                                                                                        
[19] "Marvin The Martian"                                                                                                
[20] "Melissa Duck"                                                                                                      
[21] "Miss Prissy"                                                                                                       
[22] "Nasty Canasta"                                                                                                     
[23] "O'Mike"                                                                                                            
[24] "O'Pat"                                                                                                             
[25] "Penelope Pussycat"                                                                                                 
[26] "Pepe Le Pew"                                                                                                       
[27] "Petunia Pig"                                                                                                       
[28] "Porky Pig"                                                                                                         
[29] "Ralph Wolf"                                                                                                        
[30] "Road Runner"                                                                                                       
[31] "Sam Sheepdog"                                                                                                      
[32] "Speedy Gonzales"                                                                                                   
[33] "Top contributors to this wiki"                                                                                     
[34] "Pick a List"                                                                                                       
[35] "Comment and Save"                                                                                                  
[36] "      Thanks, we're checking your submission.\n  "                                                                 
[37] ""                                                                                                                  

Try using h3.title instead. Can you guess what happened? Save the text output as an object, say char_name.
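Why does h3.title help? In a CSS selector, h3 matches every h3 element, while h3.title matches only h3 elements whose class attribute includes title. A small sketch on a hypothetical page that mimics the mix of h3 elements in the output above:

```r
library(rvest)

# hypothetical page: some h3 elements are character names, others are not
page <- read_html('
<h3 class="display-view">A brand of Warner Bros. Inc.</h3>
<h3 class="title">Bugs Bunny</h3>
<h3 class="title">Daffy Duck</h3>
<h3>Top contributors to this wiki</h3>')

all_h3   <- page |> html_elements("h3") |> html_text()        # tag only
title_h3 <- page |> html_elements("h3.title") |> html_text()  # tag plus class

length(all_h3)  # 4: every h3 on the page
title_h3        # "Bugs Bunny" "Daffy Duck"
```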

Getting the Character Description

A problem you will face:

  • The p element is used for more than just the character descriptions on this webpage.

So, what to do:

  • Get more precise: look at the XPath of each description.

Do you see a pattern?

library(rvest)

# read the page once and reuse it, rather than downloading it again for every character
looney_page <- read_html("https://www.giantbomb.com/looney-tunes/3025-714/characters/")

# function to get the description found at one xpath
get_description <- function(xp, page = looney_page){
  page |>
    html_element(xpath = xp) |>
    html_text()
}

# generate all xpaths that you want: the li index runs from 1 to the number of characters
vec_all_xpaths <- paste0('//*[@id="wiki-3025-714-characters"]/ul[1]/li[', seq_along(char_name), ']/a/p')

# get description for all characters
char_description <- purrr::map_chr(.x = vec_all_xpaths, .f = get_description)

# create the data frame
tibble::tibble(
  character = char_name,
  description = char_description
)
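The XPath approach relies on each description sitting at a predictable li position. An alternative worth knowing, sketched here on a hypothetical mock page (the id characters and the entries are made up), is to select each list item once and then look for the name and the description inside that item. html_element() applied to a set of nodes returns exactly one result per node, with NA where an item has no description, so the two columns always stay aligned.

```r
library(rvest)

# hypothetical mock of a character list; the second item has no description
mock_page <- read_html('
<div id="characters">
  <ul>
    <li><a href="#"><h3 class="title">Bugs Bunny</h3><p>A rabbit.</p></a></li>
    <li><a href="#"><h3 class="title">Gossamer</h3></a></li>
    <li><a href="#"><h3 class="title">Daffy Duck</h3><p>A duck.</p></a></li>
  </ul>
</div>')

# one node per character
items <- mock_page |> html_elements("#characters li")

# one result per item: a missing description becomes NA instead of shifting rows
char_name_mock <- items |> html_element("h3.title") |> html_text()
char_desc_mock <- items |> html_element("p") |> html_text()

tibble::tibble(
  character = char_name_mock,
  description = char_desc_mock
)
```

By contrast, html_elements("#characters li p") would return only two descriptions for three names, and the columns would no longer line up.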

Details of {rvest} functions

read_html()¹

Required Input: The URL of the webpage as a string. It can also be other things, such as a path to a local file or a string of literal XML or HTML.


Output: The parsed HTML document. The class of the output is html_document, which builds on xml_document and xml_node.


Use this function to get all the details (the HTML code) of a webpage. The output can then be used to extract the parts you want.
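Besides URLs, read_html() accepts literal HTML and local file paths. A minimal sketch, assuming {rvest} is installed, that reads the same tiny snippet both ways (the temporary file is created just for the demo) and shows that both give the same kind of object:

```r
library(rvest)

snippet <- "<p>Hello, <b>world</b>!</p>"

# 1. a string of literal HTML
from_string <- read_html(snippet)

# 2. a local file path (a temporary file we write first)
html_file <- tempfile(fileext = ".html")
writeLines(snippet, html_file)
from_file <- read_html(html_file)

class(from_file)  # "html_document" "xml_document" "xml_node"

from_string |> html_element("b") |> html_text()  # "world"
from_file   |> html_element("b") |> html_text()  # "world"
```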

read_html example

rvest::read_html("https://www.giantbomb.com/looney-tunes/3025-714/characters/") -> looney_page

looney_page
{html_document}
<html lang="en" class="no-js no-touch ">
[1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
[2] <body id="default-body" class="body--legacy wiki_object col-2-template "  ...

looney_page holds the HTML code of the webpage from the exercise.

html_elements() or html_element()¹

Required Input (x): A document, node or nodes. Like looney_page


Required Input (css or xpath): Either a CSS selector or an XPath for the desired element(s). Like h3.title or //*[@id="wiki-3025-714-characters"]/ul[1]/li[1]/a/h3


Output: The matching element(s). html_elements() returns all matches, usually as an xml_nodeset; html_element() returns only the first match for each input, usually as an xml_node.
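The difference between the plural and singular versions is easiest to see side by side. A small sketch on a hypothetical page holding three of the character headings:

```r
library(rvest)

page <- read_html('
<h3 class="title">Barnyard Dawg</h3>
<h3 class="title">Big Chungus</h3>
<h3 class="title">Bugs Bunny</h3>')

all_titles  <- page |> html_elements("h3.title")  # every match
first_title <- page |> html_element("h3.title")   # only the first match

class(all_titles)       # "xml_nodeset"
class(first_title)      # "xml_node"
length(all_titles)      # 3
html_text(first_title)  # "Barnyard Dawg"
```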


html_elements() or html_element() example

looney_page |>
  rvest::html_elements("h3.title") -> looney_chars

looney_chars
{xml_nodeset (30)}
 [1] <h3 class="title">Barnyard Dawg</h3>
 [2] <h3 class="title">Big Chungus</h3>
 [3] <h3 class="title">Bugs Bunny</h3>
 [4] <h3 class="title">Daffy Duck</h3>
 [5] <h3 class="title">Dr. Moron</h3>
 [6] <h3 class="title">Egghead Jr.</h3>
 [7] <h3 class="title">Elmer Fudd</h3>
 [8] <h3 class="title">Foghorn Leghorn</h3>
 [9] <h3 class="title">Gossamer</h3>
[10] <h3 class="title">Hamton J. Pig</h3>
[11] <h3 class="title">Hector the Bulldog</h3>
[12] <h3 class="title">Henery Hawk</h3>
[13] <h3 class="title">Hippety Hopper</h3>
[14] <h3 class="title">Hugo the Abominable Snowman</h3>
[15] <h3 class="title">K-9</h3>
[16] <h3 class="title">Lola Bunny</h3>
[17] <h3 class="title">Marvin The Martian</h3>
[18] <h3 class="title">Melissa Duck</h3>
[19] <h3 class="title">Miss Prissy</h3>
[20] <h3 class="title">Nasty Canasta</h3>
...

looney_chars has the characters from the webpage, but not in text (string) format.

looney_page |>
  rvest::html_elements(xpath = '//*[@id="wiki-3025-714-characters"]/ul[1]/li[1]/a/h3')
{xml_nodeset (1)}
[1] <h3 class="title">Barnyard Dawg</h3>

html_text()¹

html_elements() gets what we want, just not how we want it.

This is where html_text() can help.

Required Input (x): A document, node or nodes. Like looney_chars

Other Inputs (trim): If TRUE, removes whitespace from the beginning and end of each string.

Returns a character vector.
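The trim argument matters for messy pages like the one in the exercise, where some text came back wrapped in newlines and spaces. A minimal sketch on a hypothetical heading with extra whitespace:

```r
library(rvest)

# hypothetical heading whose text is padded with newlines and spaces
page <- read_html("<h3>\n      Bugs Bunny\n    </h3>")
heading <- page |> html_element("h3")

html_text(heading)               # keeps the surrounding newlines and spaces
html_text(heading, trim = TRUE)  # "Bugs Bunny"
```

{rvest} also offers html_text2(), which tries to mimic how a browser would display the text; it is worth trying when trim alone is not enough.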

html_text() example

looney_chars |>
  rvest::html_text() -> looney_chars_text

looney_chars_text
 [1] "Barnyard Dawg"               "Big Chungus"                
 [3] "Bugs Bunny"                  "Daffy Duck"                 
 [5] "Dr. Moron"                   "Egghead Jr."                
 [7] "Elmer Fudd"                  "Foghorn Leghorn"            
 [9] "Gossamer"                    "Hamton J. Pig"              
[11] "Hector the Bulldog"          "Henery Hawk"                
[13] "Hippety Hopper"              "Hugo the Abominable Snowman"
[15] "K-9"                         "Lola Bunny"                 
[17] "Marvin The Martian"          "Melissa Duck"               
[19] "Miss Prissy"                 "Nasty Canasta"              
[21] "O'Mike"                      "O'Pat"                      
[23] "Penelope Pussycat"           "Pepe Le Pew"                
[25] "Petunia Pig"                 "Porky Pig"                  
[27] "Ralph Wolf"                  "Road Runner"                
[29] "Sam Sheepdog"                "Speedy Gonzales"            

html_table()¹

Besides text, webpages often have tables that we want.

Required Input (x): A document, node or nodes. It expects the output of read_html() or html_elements().

Other Inputs (header): When TRUE, uses the first row as the column names. When NA (the default), uses the first row as the header only if it is made of <th> tags.

Other Inputs (trim): If TRUE, removes whitespace from the beginning and end of each cell.

Other Inputs (dec): The character to use as the decimal mark. Some countries use , as the decimal mark.

Returns a tibble, or a list of tibbles when applied to a whole document or to multiple elements.
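A small sketch, on a made-up table, of how header and dec behave: the first row is all <th> cells, so with the default header = NA it becomes the column names, and dec = "," lets values such as "2,5" be read as the number 2.5. Because html_table() is applied to a single table element here, it returns one tibble rather than a list.

```r
library(rvest)

# hypothetical table that uses a comma as the decimal mark
page <- read_html('
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Tea</td><td>2,5</td></tr>
  <tr><td>Coffee</td><td>1,25</td></tr>
</table>')

prices <- page |>
  html_element("table") |>  # a single element, so one tibble comes back
  html_table(dec = ",")     # header = NA (default): first row is all <th>, so it becomes the header

prices
```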

html_table() example

Go to the Department of Expenditure’s contact details page: https://doe.gov.in/whos-who

We want the tables on this page.

rvest::read_html("https://doe.gov.in/whos-who") |>
  rvest::html_elements("table") |>
  rvest::html_table()
[[1]]
# A tibble: 10 × 9
   Name            Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
   <chr>           <chr>   <lgl>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>
 1 Smt Nirmala Si… Financ… NA      230925… "23793… ""      134, N… "15, S… "app…
 2 Shri S.S. Nakul Privat… NA      230925… ""      ""      136-A,… ""      ""   
 3 Shri Vivek Sin… OSD to… NA      230925… ""      "5676,… 137A, … ""      ""   
 4 Shri Ankit Jal… Addl. … NA      230925… ""      "5676,… 142-A,… ""      "fmo…
 5 Shri B.N. Bhas… Addl. … NA      230925… ""      "5676,… 137-A,… ""      ""   
 6 Shri Karma Son… Addl. … NA      230925… ""      ""      137-A,… ""      ""   
 7 Shri Sernya Bh… 1st PA… NA      230925… ""      ""      136, N… ""      ""   
 8 Shri Anil Yadav Under … NA      230925… ""      "5676,… 135, N… ""      ""   
 9 Shri Ashok Raw… PPS     NA      230925… ""      "5676,… 142-A,… ""      ""   
10 Shri Ram Rasik… Under … NA      230925… ""      ""      167-A,… ""      ""   
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`

[[2]]
# A tibble: 10 × 9
   Name            Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
   <chr>           <chr>   <chr>   <chr>   <chr>   <lgl>   <chr>   <chr>   <chr>
 1 Shri Pankaj Ch… Minist… ""      "23093… ""      NA      "138, … ""      "mos…
 2 Shri Kumar Rav… PS to … ""      "23093… ""      NA      "North… ""      ""   
 3 Shri Alkesh Ut… Addl. … ""      "23093… ""      NA      "142, … ""      ""   
 4 Shri Gaurav Sh… US      ""      "23093… ""      NA      "144-A… ""      ""   
 5 Sh. Neeraj Mis… APS to… "MoS"   "23093… ""      NA      ""      ""      ""   
 6 Sh. Dhruv Nara… Ist PA… "MoS"   "23093… ""      NA      ""      ""      ""   
 7 MOS Finance     MOS Fi… ""      ""      ""      NA      ""      ""      ""   
 8 Dr. Bhagwat Ka… MOS Fi… ""      "23093… "011 -… NA      "165, … "302, … "mos…
 9 Shri Amit Meena PS to … ""      "23093… ""      NA      "166A,… ""      ""   
10 Shri Shambhu K… Under … ""      "23093… ""      NA      "164, … ""      ""   
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`

[[3]]
# A tibble: 5 × 9
  Name             Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
  <chr>            <chr>   <lgl>     <int>   <dbl> <chr>   <chr>   <chr>   <chr>
1 Dr. T. V. Soman… F.S. &… NA       2.31e7 NA      5610, … 129-A,… ""      "sec…
2 Shri S. Sudarsh… PSO     NA       2.31e7 NA      5624, … 129-C … ""      ""   
3 Shri Rakesh Kum… PPS     NA       2.31e7  9.96e9 5610, … 129-C … "A-1/2… ""   
4 Sh. Chanakya Ke… PPS     NA       2.31e7  9.87e9 5610, … 129-C,… "892, … ""   
5 Sh. Netra Pal S… Consul… NA       2.31e7  9.97e9 5624, … 129-C,… "138-A… ""   
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`

[[4]]
# A tibble: 6 × 9
  Name            Desig…¹ Divis…² Telep…³ Teleph…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
  <chr>           <chr>   <lgl>   <chr>      <dbl>   <int> <chr>   <chr>   <chr>
1 .               Additi… NA      ""      NA            NA ""      ""      ""   
2 Shri Rajesh Dh… PPS to… NA      "011-2… NA          5679 "166-E… ""      "raj…
3 Mrs. Kavita Ma… PPS     NA      "011-2… NA          5679 "166E,… "EA-25… "m[d…
4 Shri M K Sahoo  Adviser NA      "011-2… NA            NA "504, … ""      "mks…
5 Sh. Rana Mukes… PA to … NA      "011-2… NA            NA ""      ""      "ran…
6 Shri Sushobhan… SSO     NA      "011-2…  9.20e11      NA "505 1… ""      "s[d…
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`

[[5]]
# A tibble: 21 × 9
   Name            Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
   <chr>           <chr>   <chr>   <chr>     <dbl> <chr>   <chr>   <chr>   <chr>
 1 Smt.Annie G. M… Specia… "Perso… 011-23… NA      5648    39-A, … "Bunga… "mat…
 2 Smt. Sudha Raj… PSO to… "Perso… 011-23… NA      5648    36, No… ""      "Sud…
 3 Ms. Divya Alat… Direct… "Admin" 230926… NA      5693    168-C,… ""      "div…
 4 Shri Shyam Kis… Sr. PP… "Admin" 011-23… NA      5616    169-A,… ""      "kis…
 5 Shri S.N. Rana  Under … "GAD/C… 011-23… NA      5665    56A, N… ""      "ran…
 6 Sh. Ranjit Kum… Under … "Admin… 011-23… NA      5695    225-E,… ""      "ran…
 7 Sh. K.J. Bhatt  Under … "Admn." 230957… NA      5722    225E, … ""      ""   
 8 Sh. Pijush Moh… US (Vi… ""      230956…  9.54e9 5656    231/NB  "32-A,… "pij…
 9 Sh. Ravi Kumar  SO      "GAD"   230956… NA      5621, … 56A, N… ""      ""   
10 Shri Rajeshwar… Deputy… "Offic… 011-23… NA      5620    261, N… ""      "raj…
# … with 11 more rows, and abbreviated variable names ¹​Designation,
#   ²​`Division/section`, ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`,
#   ⁵​`Intercom No.`, ⁶​`Room No.`, ⁷​`Address(Residence)`
# ℹ Use `print(n = ...)` to see more rows

[[6]]
# A tibble: 36 × 9
   Name            Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
   <chr>           <chr>   <chr>   <chr>   <lgl>   <chr>   <chr>   <chr>   <chr>
 1 Smt..Annie G. … Specia… Person… 230932… NA      "5648"  39-A, … "Bunga… "mat…
 2 Smt. Sudha Raj… PSO to… Person… 011-23… NA      "5648"  36 Nor… ""      "Sud…
 3 Sh. THANGLEMLI… Joint … E.Coor… 230932… NA      "5690"  74 -B/… ""      "THA…
 4 Smt. Nirmala D… Direct… EG      230932… NA      "5623"  37, NB  ""      "n[d…
 5 Sh. Avinash K … Deputy… E.II -A 230926… NA      "5609"  48 -E,… ""      "avi…
 6 Sh. Umesh Kuma… Deputy… E. III… 011-23… NA      "5715"  225-D,… ""      "uk[…
 7 Sh. B. Sengupta D.S.    E.III-… 230927… NA      "5723"  76 -A/… ""      ""   
 8 Sh. B.K. Manth… D.S.    E.III-… 230945… NA      "5669"  74-C/NB ""      ""   
 9 Shri Ram Gopal  Deputy… E.III B 230922… NA      "5726"  30A, N… ""      ""   
10 Shri R.D. Talu… D.S     EMC     246279… NA      ""      502/LNB ""      ""   
# … with 26 more rows, and abbreviated variable names ¹​Designation,
#   ²​`Division/section`, ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`,
#   ⁵​`Intercom No.`, ⁶​`Room No.`, ⁷​`Address(Residence)`
# ℹ Use `print(n = ...)` to see more rows

[[7]]
# A tibble: 14 × 9
   Name            Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
   <chr>           <chr>   <chr>   <chr>     <dbl>   <int> <chr>   <chr>   <chr>
 1 Smt.Annie G. M… Specia… "Perso… 011-23… NA         5648 "39-A,… "Bunga… "mat…
 2 Smt. Sudha Raj… PSO to… "Perso… 011-23… NA         5648 "36, N… ""      "Sud…
 3 Sh. V. Padmana… D.S.    "RTI &… 246177… NA           NA "501/L… ""      ""   
 4 Smt. Pratima G… DDG     "Exp."  246537… NA           NA "515,L… ""      ""   
 5 Sh. Shiv Ram M… Direct… "SIU"   246975… NA           NA "505/L… ""      ""   
 6 Sh. Kailash Ch… Under … "SIU"   246110… NA           NA "503/L… ""      ""   
 7 Sh.........     Under … ""      246186… NA           NA "506/L… ""      ""   
 8 Sh. Devinder K… Under … "RTI &… 246545…  8.08e9      NA "504,L… ""      ""   
 9 Smt. Uma Aggar… SO      ""      246189… NA           NA ""      ""      ""   
10 Smt. Kavita Sa… PS      ""      246189… NA           NA "508,L… ""      ""   
11 Sh. Lalit Kumar SO      ""      246189… NA           NA "508,L… ""      ""   
12 Sh. Raj Kumar   SO      "RTI"   246545…  8.45e9      NA "511, … ""      ""   
13 Sh.Santosh Kum… Careta… ""      246189… NA           NA "508/L… "E-163… ""   
14 Sh. Rajesh Sha… SO      "SIU"   246545…  9.01e9      NA "511, … ""      ""   
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`

[[8]]
# A tibble: 1 × 9
  Name             Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
  <chr>            <chr>   <chr>     <int> <lgl>   <lgl>   <chr>   <lgl>   <lgl>
1 Ms. Gurpreet Ka… SSO     PRU      2.46e7 NA      NA      511, L… NA      NA   
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`

[[9]]
# A tibble: 13 × 9
   Name            Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
   <chr>           <chr>   <chr>   <chr>   <lgl>   <chr>   <chr>   <chr>   <chr>
 1 Shri Amit Sing… Joint … "PFC-I" 230948… NA      5652    142-B,… ""      "neg…
 2 Sh. L.K. Trive… Direct… "PFC.I" 230933… NA      5642    264 C,… ""      "lk[…
 3 Ms Swayamprava… Direct… "PFC.I" 230926… NA      5636    225-C,… ""      "swa…
 4 P. Parthiban    Deputy… "PFC I" 011230… NA      5645    167 B,… ""      "p[d…
 5 Ms. Shalaka Ku… Dy. Di… "PFC.I" 230956… NA      5664    79, NB  ""      ""   
 6 CA Ranganath A… Deputy… ""      011-23… NA      5696    R No. … "Db001… "ran…
 7 Sh. Partha Paul US      "PFC.I" 230956… NA      5643    77,NB   ""      ""   
 8 Sh. Krishnakan… S.O-PF  "PFC-I" 230956… NA      5622, … 65,NB   ""      ""   
 9 Sh. Mangal Pra… S.O.    "PFC-I" 230956… NA      5651    77/NB   ""      ""   
10 Rajesh Vermani  S.O.    "PFC-I" 230956… NA      5651    77,NB   ""      ""   
11 Sh............… S.O.    "PFC-I" 230956… NA      5622    65      ""      ""   
12 Ms. Subha Vija… S.O.    "PFC-I" 230956… NA      5622    65/NB   ""      ""   
13 Ms. Aruna Arora Asstt.… "PFC-I" 230956… NA      5622, … 65/NB   ""      ""   
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`

[[10]]
# A tibble: 12 × 9
   Name            Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
   <chr>           <chr>   <chr>     <int>   <dbl> <chr>   <chr>   <chr>   <chr>
 1 Dr. Sajjan Sin… "Addit… "PF St…  2.31e7 NA      "5673"  169-C,… ""      ""   
 2 Sh Ravinder Ku… "Sr. P… "PF St…  2.31e7 NA      "5682"  143, N… "."     ""   
 3 .               "PPS t… "PF St…  2.31e7 NA      "5682"  143, NB ""      ""   
 4 Sh. Prateek Ku… "Direc… "PFC.I"  2.31e7 NA      "5660"  76, No… ""      "pra…
 5 Sh Deependra K… "Direc… "PF St…  2.31e7  9.45e9 "5612"  145,NB  ""      "kum…
 6 Shri G. S. Ana… "Direc… "PF-St…  2.31e7 NA      "5691"  162,NB  ""      "gsa…
 7 Ms. Anjali Mau… "Assis… "PF St…  2.31e7 NA      "5697"  80, No… ""      "mau…
 8 Sh. Rabi Ranjan "Deput… "PFC.I"  2.31e7 NA      "5672"  264,NB  ""      ""   
 9 Vacant          "Asstt… "PF St… NA      NA      ""      79,Nor… ""      ""   
10 Shri Sumit Aga… "Deput… "PFS"    2.31e7 NA      "5700"  79, NB  ""      "agr…
11 PF (State)      ""      ""       2.31e7 NA      "5625,… 80,NB   ""      ""   
12 Sh.             "AAO"   "PF-St…  2.31e7 NA      "5726"  30 A    ""      ""   
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`

[[11]]
# A tibble: 7 × 9
  Name             Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
  <chr>            <chr>   <chr>     <int> <lgl>     <int> <chr>   <lgl>   <chr>
1 Shri Sanjay Pra… Addl. … "PF Ce…  2.31e7 NA         5720 161,NB  NA      "js[…
2 Sh. Tilak Raj G… PPS     ""       2.31e7 NA         5698 163,NB  NA      ""   
3 Mr. Amit Kumar   Jt. Dir "PFC -…  2.31e7 NA         5688 162, NB NA      ""   
4 Ms. Hema Jaiswal Dir.    ""       2.31e7 NA         5614 167-B,… NA      ""   
5 Sh. Puspendra S… Dy. Di… "PFC.I…  2.31e7 NA         5640 80, NB  NA      "pus…
6 Sh. Rangin Murmu Dy. Di… "PFC-I…  2.31e7 NA         5701 79, NB  NA      ""   
7 Sh. Aayush Bans… Deputy… "PFC-I…  2.31e7 NA         5644 79, No… NA      "abc…
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`

[[12]]
# A tibble: 5 × 9
  Name             Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
  <chr>            <chr>   <chr>     <int>   <dbl>   <int> <chr>   <chr>   <chr>
1 Shri Manoj Sahay Additi… ""       2.31e7  9.97e9    5685 166-C,… "C II/… "man…
2 Sh. Surendra Ku… PPS to… ""       2.31e7 NA         5603 163,NB  ""      ""   
3 Ms. A. Seetha M… PPS to… "Fin/M…  2.31e7 NA         5603 163, NB ""      ""   
4 Sh. Deepak Math… Dy. Se… "Rev./…  2.31e7 NA         5401 71-A,/… ""      ""   
5 Shri Nitin Kumar S.O.    "MD"     2.31e7 NA         5714 276-C,… ""      ""   
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`

[[13]]
# A tibble: 6 × 9
  Name             Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
  <chr>            <chr>   <chr>     <int>   <dbl>   <int> <chr>   <chr>   <chr>
1 Shri Manoj Sahay Additi… ""       2.31e7  9.97e9    5685 166-C,… "C II/… "man…
2 Sh. Surendra Ku… PPS to… ""       2.31e7 NA         5603 163,NB  ""      ""   
3 Ms. A. Seetha M… PPS to… "Fin/M…  2.31e7 NA         5603 163, NB ""      ""   
4 Sh. Deepak Math… Dy. Se… "Rev./…  2.31e7 NA         5401 71-A,NB ""      ""   
5 Sh.Arvind Kumar… US      "IFU"    2.31e7 NA         5607 225-E/… ""      ""   
6 Ms. Preeti Shar… S.O     "IFU"    2.31e7 NA         5662 241-D,… ""      ""   
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`

[[14]]
# A tibble: 2 × 9
  Name             Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
  <chr>            <chr>   <chr>     <int> <lgl>   <lgl>     <int> <chr>   <lgl>
1 Sh. Subrata Cha… US      MC       2.46e7 NA      NA          139 2nd Fl… NA   
2 Sh. Vishwa Nath… SO      MC       2.46e7 NA      NA           NA 2nd Fl… NA   
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`

[[15]]
# A tibble: 6 × 9
  Name             Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
  <chr>            <chr>   <chr>     <int> <lgl>     <int> <chr>   <lgl>   <lgl>
1 Dr. Sajjan Sing… Additi… "FCD"    2.31e7 NA         5673 169-C   NA      NA   
2 Sh. Ravinder Ku… Sr. PP… "FCD"    2.31e7 NA         5682 143,NB  NA      NA   
3 Mrs. Poonam Chh… PPS to… "FCD"    2.31e7 NA         5682 166-E,… NA      NA   
4 Shri Abhay Kumar Direct… "FCD"    2.44e7 NA           NA .503,C… NA      NA   
5 Sh. Rajendra Ku… PPS to… ""       2.44e7 NA           NA 502     NA      NA   
6 Shri Mahesh Kum… Deputy… ""       2.44e7 NA           NA 508     NA      NA   
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`

[[16]]
# A tibble: 5 × 9
  Name             Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
  <chr>            <chr>   <chr>     <int> <lgl>     <int> <chr>   <lgl>   <chr>
1 Shri Sanjay Agg… "Advis… PPD      2.31e7 NA         5608 168-B,… NA      "san…
2 Ms. Manju Kumari "PPS"   PPD      2.31e7 NA         5708 169-A,… NA      ""   
3 Shri Kanwalpreet "Direc… PPD      2.31e7 NA         5683 264C/NB NA      ""   
4 US PPD           ""      PPD      2.46e7 NA           NA 512, L… NA      ""   
5 Sh..             "SO"    PPD      2.46e7 NA           NA LNB     NA      ""   
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`

[[17]]
# A tibble: 18 × 9
   Name            Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
   <chr>           <chr>   <chr>   <chr>   <lgl>     <int> <chr>   <lgl>   <chr>
 1 Shri Umesh Kum… Chief … Cost    "24698… NA         8522 "201"   NA      "uks…
 2 Sh. A.K. Gaur   Sr. PPS Cost    "24698… NA           NA "202"   NA      ""   
 3 Sh. Gurubaksh … PS      Cost    "24698… NA         8522 "202,L… NA      ""   
 4 Shri Amardeep … Adviser Cost    "24618… NA         8906 "204-B… NA      "ama…
 5 Shri Pankaj Gu… Adviser Admn.   "24698… NA         8435 "205-B" NA      "gup…
 6 Shri Rajesh Ya… Direct… Cost    "24694… NA         4021 "209-B" NA      ""   
 7 Shri Manoj Kum… Joint … Cost    "24617… NA         7075 "205-C" NA      "man…
 8 Shri T.R.Sathi… Joint … Cost    "24698… NA         8640 "205-A" NA      "sch…
 9 Shri Prakash H… Dy. Di… Cost    "24653… NA         3487 "206, … NA      "pra…
10 Ms. Priyanka S… Dy. Di… Cost    ""      NA           NA ""      NA      "pri…
11 Shri Deepak Ga… Dy. Di… Cost    "24653… NA         3487 "206, … NA      "dee…
12 Shri Devanshi … Dy. Di… Cost    "24653… NA         3487 "206, … NA      "dev…
13 Ms. R Kalyanas… Asstt.… Cost    "24692… NA         2541 "206, … NA      "cma…
14 Shri Rahul Cha… Dy. Di… Cost    "24692… NA         2541 "206, … NA      "rah…
15 Shri Manoj Kum… Dy, Di… Cost    "24653… NA         3487 "206,L… NA      "ca[…
16 Shri Harsh Jos… Asstt.… Admn.   "24692… NA         2541 "206, … NA      "har…
17 Shri Pankaj Pa… Asstt.… Cost    "24692… NA         2541 "206, … NA      "pan…
18 Shri Mahesh Ch… Sectio… Admn.   "24693… NA         3895 "208,L… NA      "sha…
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`

[[18]]
# A tibble: 5 × 9
  Name             Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
  <chr>            <chr>   <lgl>     <int> <lgl>     <int> <chr>   <lgl>   <chr>
1 Sh. Alok Ranjan  Chief … NA       2.31e7 NA         5729 240-B,… NA      "cca…
2 Shri Vivekanand  Contro… NA       2.31e7 NA         5730 241-A,… NA      "ana…
3 Sh. Vikas Chand… DCA Fi… NA      NA      NA           NA 401, E… NA      "vc[…
4 Shri Himanshu S… ACA Fi… NA       2.31e8 NA           NA 269, N… NA      "sri…
5 Sh. Vikash Chan… Dy. Co… NA       2.31e7 NA           NA R. No.… NA      ""   
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`

[[19]]
# A tibble: 19 × 9
   Name            Desig…¹ Divis…² Telep…³ Telep…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
   <chr>           <chr>   <chr>   <chr>     <dbl> <chr>   <chr>   <chr>   <lgl>
 1 Reception offi… ""      ""      230956… NA      "5617,… ""      ""      NA   
 2 Reception offi… ""      ""      230956… NA      "5605,… ""      ""      NA   
 3 Finance Canteen ""      ""      230956… NA      "5677,… ""      ""      NA   
 4 Tea Board       ""      ""      230956… NA      "5670"  ""      ""      NA   
 5 Coffee Board    ""      ""      230939… NA      "5692"  ""      ""      NA   
 6 Bikano          ""      ""      230956… NA      "5601"  ""      ""      NA   
 7 Driver Room     ""      ""      230956… NA      "5675"  ""      ""      NA   
 8 Electrical Enq… ""      ""      230948… NA      ""      ""      ""      NA   
 9 Asstt. Enginee… ""      ""      230923… NA      ""      ""      ""      NA   
10 Sh. Vishal      "(LG R… ""      987333… NA      ""      ""      ""      NA   
11 Sh. M Krishnan  "Jr. E… "AC/El… 230939… NA      ""      ""      "96435… NA   
12 Sh. D.C. Sharma "Asst.… ""      230939…  9.87e9 ""      ""      ""      NA   
13 Sh. Manish Kum… "Execu… ""      230923…  9.87e9 ""      ""      ""      NA   
14 CPWD (Civil)    ""      ""      230920… NA      ""      ""      ""      NA   
15 CPWD (Electric… ""      ""      230935… NA      ""      ""      ""      NA   
16 Fire Station, … ""      ""      230927… NA      ""      ""      ""      NA   
17 JTO, MTNL, NB … ""      ""      230920… NA      ""      ""      ""      NA   
18 Internet Compl… ""      "NIC- … 180011… NA      ""      "35AB"  ""      NA   
19 Stationery Sto… ""      ""      230956… NA      "5658"  ""      ""      NA   
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`

[[20]]
# A tibble: 3 × 9
  Name            Desig…¹ Divis…² Telep…³ Teleph…⁴ Inter…⁵ Room …⁶ Addre…⁷ Email
  <chr>           <chr>   <chr>   <chr>      <dbl>   <int> <chr>   <chr>   <chr>
1 Sh. Rajesh Mal… Princi… "Media… 011-23…  9.87e 9    5006 B-77, … "Flat … "dpr…
2 Ms. Gurmeet Bh… Sr. PPS "Media… 011-23… NA          5637 A-76, … ""      "gur…
3 Sh. Kush Mohan… M&CO    ""      230939…  1.00e10    5637 A-76    ""      ""   
# … with abbreviated variable names ¹​Designation, ²​`Division/section`,
#   ³​`Telephone (Office)`, ⁴​`Telephone(Residence)`, ⁵​`Intercom No.`,
#   ⁶​`Room No.`, ⁷​`Address(Residence)`
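Notice that the same column can come back with different types in different tables: for example, Telephone (Office) is <chr> in [[1]] but <int> in [[3]]. That would make dplyr::bind_rows() fail if you tried to stack all the tables into one. One way around it, sketched here on two hypothetical mini tables, is html_table(convert = FALSE), which skips type guessing and keeps every column as character, so the tables can be combined first and converted afterwards if needed.

```r
library(rvest)
library(dplyr)

# two hypothetical tables with the same header but differently shaped values
page <- read_html('
<table>
  <tr><th>Name</th><th>Phone</th></tr>
  <tr><td>A</td><td>23092</td></tr>
</table>
<table>
  <tr><th>Name</th><th>Phone</th></tr>
  <tr><td>B</td><td>011-2309</td></tr>
</table>')

# convert = FALSE skips type guessing, so every column stays character
tables <- page |>
  html_elements("table") |>
  html_table(convert = FALSE)

# stack the list of tibbles into one tibble
combined <- bind_rows(tables)

combined
```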

There is much more to learn about {rvest}.

What we have covered is a sufficient building block for a working knowledge of scraping webpages.